
Conversation

@ZX-ModelCloud

No description provided.

@ZX-ModelCloud changed the title from "add meta info" to "Add Meta" on Dec 4, 2024
@ZX-ModelCloud changed the title from "Add Meta" to "Add quantize_config.meta property" on Dec 4, 2024
@ZX-ModelCloud changed the title from "Add quantize_config.meta property" to "Fix optimum compat" on Dec 5, 2024
@jiqing-feng merged commit 5979473 into jiqing-feng:gptq on Dec 5, 2024
@jiqing-feng added a commit that referenced this pull request on Dec 23, 2024
* align the gptq check with transformers to support cpu

* fix comment

* gptqmodel

Signed-off-by: jiqing-feng <[email protected]>

* compatible with auto-gptq

Signed-off-by: jiqing-feng <[email protected]>

* fix compatibility with auto-gptq

Signed-off-by: jiqing-feng <[email protected]>

* fix compatibility with the auto-gptq linear layer

Signed-off-by: jiqing-feng <[email protected]>
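
A minimal sketch of the backend dispatch these compatibility commits circle around: prefer gptqmodel (which brings the cpu support) and fall back to auto-gptq. Only is_auto_gptq_available() is confirmed by the commits; the gptqmodel probe below is an illustrative try/except, not optimum's actual code.

```python
from optimum.utils.import_utils import is_auto_gptq_available


def pick_gptq_backend() -> str:
    """Illustrative only: prefer gptqmodel, fall back to auto-gptq."""
    try:
        import gptqmodel  # noqa: F401  # availability probe, not optimum's real check
        return "gptqmodel"
    except ImportError:
        pass
    if is_auto_gptq_available():
        return "auto-gptq"
    raise ImportError("Neither gptqmodel nor auto-gptq is installed.")
```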

* revert unrelated changes

Signed-off-by: jiqing-feng <[email protected]>

* gptqmodel needs to use checkpoint_format (#1)

* need checkpoint_format

* default value of checkpoint_format is gptq

* fix quantize

* fix quantize

* fix quantize

* Update quantizer.py

* need to convert to v1 before gptqmodel save

* set checkpoint_format back to gptq after the convert (see the save-path sketch after this sub-PR's commit list)

* cleanup code

* sym=False is not supported with auto-gptq

* add comments

* cleanup code

* Update quantizer.py

* always convert v2 to v1 if checkpoint_format = "gptq"

* Update quantizer.py

---------

Co-authored-by: ZX-ModelCloud <[email protected]>
Co-authored-by: Qubitium-ModelCloud <[email protected]>
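
The save-path flow these checkpoint_format commits describe, as a hedged sketch: gptqmodel computes in a v2 layout, so when checkpoint_format is "gptq" (the default) the weights are converted back to v1 before saving and the config field is restored afterwards, while sym=False models stay in gptq_v2 because auto-gptq's v1 format cannot represent them. The import path and signature of hf_convert_gptq_v2_to_v1_format are assumptions; only the helper's name comes from the commits.

```python
from gptqmodel.utils import hf_convert_gptq_v2_to_v1_format  # import path assumed


def save_quantized(model, quantize_config, save_dir: str):
    # sym=False checkpoints keep checkpoint_format="gptq_v2" and skip this branch
    if quantize_config.checkpoint_format == "gptq":
        # v2 -> v1 so auto-gptq and older loaders can read the checkpoint
        model = hf_convert_gptq_v2_to_v1_format(model, quantize_config)
    model.save_pretrained(save_dir)
    quantize_config.checkpoint_format = "gptq"  # back to gptq after the convert
```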

* Mod backend code (#2)

* keep gptq_v2 if sym is false

* use hf_convert_gptq_v1_to_v2_format, hf_convert_gptq_v2_to_v1_format, and hf_gptqmodel_post_init (see the load-path sketch after this sub-PR's commit list)

* no need to check the backend

* use device_map

* cleanup

* Update quantizer.py

* move import

---------

Co-authored-by: Qubitium-ModelCloud <[email protected]>
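
The matching load path, under the same assumptions: a v1 checkpoint is converted to the v2 layout the gptqmodel kernels compute with, then post-init prepares the kernel buffers. Import paths and the exact signatures of the hf_* helpers are assumed; only their names appear in the commits.

```python
from gptqmodel.utils import (  # import path assumed
    hf_convert_gptq_v1_to_v2_format,
    hf_gptqmodel_post_init,
)


def post_load(model, quantize_config):
    if quantize_config.checkpoint_format == "gptq":
        # v1 -> v2: the layout the kernels actually compute with
        model = hf_convert_gptq_v1_to_v2_format(model, quantize_config)
    # signature assumed; auto-gptq's post_init takes a similar act-order flag
    return hf_gptqmodel_post_init(model, use_act_order=quantize_config.desc_act)
```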

* fix format and log

Signed-off-by: jiqing-feng <[email protected]>

* fix version check

Signed-off-by: jiqing-feng <[email protected]>
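
A minimal sketch of the kind of backend version floor this commit fixes; the minimum version constant is illustrative, not the real requirement.

```python
from packaging import version

import gptqmodel

MIN_GPTQMODEL = version.parse("1.4.0")  # illustrative floor, not the real minimum


def check_gptqmodel_version() -> None:
    v = version.parse(gptqmodel.__version__)
    if v < MIN_GPTQMODEL:
        raise ImportError(f"gptqmodel >= {MIN_GPTQMODEL} is required, found {v}")
```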

* enable gptqmodel tests

Signed-off-by: jiqing-feng <[email protected]>

* update check quant type

Signed-off-by: jiqing-feng <[email protected]>

* Fix optimum compat (#3)

* add meta info

* cleanup

* cleanup

* The value of quantizer should be an array

* Update quantizer.py

* If is_auto_gptq_available(), also write "auto_gptq:version" to "quantizer"

* If is_auto_gptq_available(), also write "auto_gptq:version" to "quantizer"

* Update quantizer.py

* cleanup

* comment on meta

* hf_select_quant_linear now receives checkpoint_format (see the meta and kernel-selection sketch after this sub-PR's commit list)

* add todo fix

* move convert code to quantizer.save()

* Update quantizer.py

* Optimize hf_convert_gptq_v2_to_v1_format()

* Optimize hf_convert_gptq_v1_to_v2_format()

* fix GPTQTestCUDA

* hf_select_quant_linear() always sets pack=True

* gptqmodel.hf_select_quant_linear() no longer selects ExllamaV2

* gptqmodel.hf_select_quant_linear() no longer selects ExllamaV2

* GPTQQuantizer add backend

* lowercase checkpoint_format and backend

* cleanup

* move backend to bottom

* no need to check gptqmodel version for ipex support

* Update import_utils.py

* Update quantizer.py

* fix UnboundLocalError: cannot access local variable 'version' where it is not associated with a value

* make version var short

* Update import_utils.py

* fix unittest

* use assertLessEqual

---------

Co-authored-by: Qubitium-ModelCloud <[email protected]>
Co-authored-by: LRL <[email protected]>
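
Two behaviors from this sub-PR, sketched together: the meta bookkeeping ("quantizer" is an array, and auto_gptq's version is appended when it is installed) and the kernel-selection call (checkpoint_format is passed through, pack=True is always set, the new backend field travels along, and ExllamaV2 is never chosen). The import path and any keyword names beyond those called out in the commits are assumptions.

```python
from gptqmodel.utils.importer import hf_select_quant_linear  # import path assumed
from optimum.utils.import_utils import is_auto_gptq_available


def build_meta(optimum_version: str, gptqmodel_version: str) -> dict:
    # "quantizer" is an array so several producers can be recorded
    quantizer = [f"optimum:{optimum_version}", f"gptqmodel:{gptqmodel_version}"]
    if is_auto_gptq_available():
        import auto_gptq
        quantizer.append(f"auto_gptq:{auto_gptq.__version__}")
    return {"quantizer": quantizer}


QuantLinear = hf_select_quant_linear(
    bits=4,
    group_size=128,
    desc_act=False,
    sym=True,
    checkpoint_format="gptq",  # now passed through
    backend=None,              # GPTQQuantizer's new backend field
    device_map="cpu",
    pack=True,                 # always set on this path
)
```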

* fix format and convert v2 to v1

Signed-off-by: jiqing-feng <[email protected]>

* [Fix] all tensors not on the same device (#5)

* fix device error

* update gptqmodel version

* fix test
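
A hedged illustration of the class of fix in this sub-PR: move every tensor participating in a kernel call onto one device instead of assuming they already agree. The helper name is illustrative, not the patched code.

```python
import torch


def align_devices(device: torch.device, *tensors: torch.Tensor) -> tuple:
    """Move every participating tensor to one device before a kernel call."""
    return tuple(t if t.device == device else t.to(device) for t in tensors)


# e.g. inside a quantized linear forward:
# x, scales, qzeros, g_idx = align_devices(qweight.device, x, scales, qzeros, g_idx)
```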

* fix format

Signed-off-by: jiqing-feng <[email protected]>

* add gptqmodel tests that include cpu

Signed-off-by: jiqing-feng <[email protected]>

* fix all auto-gptq tests

Signed-off-by: jiqing-feng <[email protected]>

* revert tests

Signed-off-by: jiqing-feng <[email protected]>

* rm gptqmodel yaml

Signed-off-by: jiqing-feng <[email protected]>

* fix comment

Signed-off-by: jiqing-feng <[email protected]>

* enable real cpu tests via fp32

Signed-off-by: jiqing-feng <[email protected]>
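
What "real cpu tests via fp32" plausibly amounts to: loading the quantized model on cpu in float32 so the cpu kernels are genuinely exercised (fp16 is a gpu assumption). The checkpoint id below is a placeholder, not taken from the source.

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "<gptq-quantized-model>",   # placeholder id, not from the source
    device_map="cpu",
    torch_dtype=torch.float32,  # fp32 so the cpu path really runs
)
```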

* fix test model name

Signed-off-by: jiqing-feng <[email protected]>

* keep the original device setting when using auto-gptq

Signed-off-by: jiqing-feng <[email protected]>
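
One plausible reading of this commit, sketched: the gptqmodel path may remap to cpu when no accelerator is present, while the auto-gptq path leaves the caller's original device setting untouched. The helper is illustrative, not the patched code.

```python
import torch


def resolve_device(original: torch.device, use_gptqmodel: bool) -> torch.device:
    # gptqmodel supports cpu, so it may fall back there; auto-gptq keeps
    # whatever device the caller originally configured.
    if use_gptqmodel and not torch.cuda.is_available():
        return torch.device("cpu")
    return original
```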

* Update optimum/gptq/quantizer.py

Co-authored-by: Ilyas Moutawwakil <[email protected]>

* Update optimum/gptq/quantizer.py

Co-authored-by: Ilyas Moutawwakil <[email protected]>

---------

Signed-off-by: jiqing-feng <[email protected]>
Co-authored-by: LRL-ModelCloud <[email protected]>
Co-authored-by: ZX-ModelCloud <[email protected]>
Co-authored-by: Qubitium-ModelCloud <[email protected]>
Co-authored-by: ZX-ModelCloud <[email protected]>
Co-authored-by: LRL <[email protected]>
Co-authored-by: Ilyas Moutawwakil <[email protected]>